Restaurant Review

Top 10 Cuisines Across NYC and by Borough

NYC is a truly international metropolis: our dataset contains 80 different cuisine types. We first plot the 10 most frequent cuisine types across the city.


From the plot above, the 10 most frequent cuisine types across NYC are American, Chinese, Coffee/Tea, Pizza, Bakery Products/Desserts, Mexican, Japanese, Italian, Latin American, and Caribbean, which reflects the racial diversity of NYC. American, Chinese, and Coffee/Tea are the three most common cuisine types in the city. We then inspect the 10 most frequent cuisine types in each borough.

## `summarise()` has grouped output by 'boro'. You can override using the
## `.groups` argument.

The three most frequent cuisine types in Manhattan and Brooklyn are American, Coffee/Tea, and Chinese. In Queens, the top three are Chinese, Latin American, and American. This makes sense, since Queens has the largest Asian American population by county outside the Western United States. In the Bronx, the top three are Pizza, American, and Chinese. In Staten Island, American, Donuts, and Pizza are the most common.
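The code behind the borough ranking is not shown here, so the following is only a minimal sketch of how such a ranking could be computed. It assumes one row per restaurant with columns named `boro` and `cuisine_description` (as in `inspection_raw`); the `toy` data frame and its values are hypothetical.

```r
library(dplyr)

# Hypothetical stand-in for the real data: one row per restaurant
toy <- tibble(
  boro = c("Queens", "Queens", "Queens", "Queens", "Bronx", "Bronx", "Bronx"),
  cuisine_description = c("Chinese", "Chinese", "Latin American", "American",
                          "Pizza", "Pizza", "American")
)

top_by_boro <- toy %>%
  # number of restaurants per borough and cuisine type
  count(boro, cuisine_description, name = "n_restaurants") %>%
  group_by(boro) %>%
  # keep the single most frequent cuisine per borough (use n = 10 for a top 10)
  slice_max(n_restaurants, n = 1, with_ties = FALSE) %>%
  ungroup()

top_by_boro
```

Dropping `boro` from `count()` and removing the `group_by()` step would give the citywide ranking instead.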


Price vs. Cuisine

# Top 10 cuisine types within each price level
price_cuisine <-
  inspection_raw %>%
  select(dba, boro, cuisine_description, critical_flag, score, grade, grade_date,
         inspection_type, latitude, longitude, rating, review_num, price) %>%
  drop_na(boro, price) %>%
  mutate(price = as.factor(price)) %>%
  group_by(price) %>%
  # number of rows per cuisine type within each price level
  count(cuisine_description) %>%
  mutate(cuisine_description = fct_reorder(cuisine_description, n)) %>%
  # keep the 10 most frequent cuisine types per price level (ties may add more)
  filter(min_rank(desc(n)) <= 10) %>%
  ggplot(aes(x = cuisine_description, y = n, fill = price)) +
  geom_bar(position = "dodge", stat = "identity") +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1, size = 5)) +
  labs(
    x = "Cuisine type",
    y = "Number") +
  facet_grid(~price)
ggplotly(price_cuisine)

At price level $, the three most frequent cuisine types are Chinese, Pizza, and American. At price level $$, they are American, Chinese, and Coffee/Tea. At price level $$$, they are American, Italian, and Chinese. At price level $$$$, Japanese and American dominate. Only a few restaurants in our dataset are labeled $$$$, which seems to contradict the common view that NYC has many upscale, expensive restaurants. This may be a limitation of our data source: the dataset was built by merging restaurants that are or have been under inspection with restaurants that can be found on Yelp. Very expensive restaurants may be missing from the inspection records or may not be listed on Yelp.


Price vs. Borough

price_boro<-
  inspection_raw %>% 
  select(dba,boro,cuisine_description,critical_flag,score,grade,grade_date,inspection_type,latitude,longitude,rating,review_num,price) %>%
  drop_na(boro,price) %>%
  mutate(boro = fct_infreq(boro),
         price=as.factor(price)) %>%
  ggplot(aes(x = boro, fill = price)) + 
  geom_bar()
ggplotly(price_boro)

The plot above shows the number of restaurants at each price level in each borough. Restaurants at price levels $$$ and $$$$ make up only a tiny proportion in every borough. The most common price level in Manhattan is $$.
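Because the stacked bars show raw counts, the proportions mentioned here can be hard to read off the plot. As a rough sketch, the within-borough share of each price level could be tabulated as below. The `toy` data frame and its values are hypothetical, and the sketch assumes one row per restaurant with price stored as strings such as "$" and "$$".

```r
library(dplyr)

# Hypothetical stand-in for the real data: one row per restaurant
toy <- tibble(
  boro  = c("Manhattan", "Manhattan", "Manhattan", "Manhattan", "Queens", "Queens"),
  price = c("$", "$$", "$$", "$$$$", "$", "$")
)

price_share <- toy %>%
  # number of restaurants per borough and price level
  count(boro, price, name = "n_restaurants") %>%
  group_by(boro) %>%
  # share of each price level within its borough
  mutate(share = n_restaurants / sum(n_restaurants)) %>%
  ungroup()

price_share
```

The same comparison can be made visually by passing `position = "fill"` to `geom_bar()`, which rescales every bar to a total height of 1.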


Review Number vs. Rating Score (Excluding Extreme Outliers)

# Total number of reviews at each rating value
review_num_rating <-
  inspection_raw %>%
  select(dba, boro, cuisine_description, critical_flag, score, grade, grade_date,
         inspection_type, latitude, longitude, rating, review_num, price) %>%
  drop_na(boro, price, rating) %>%
  # keep only rows with more than 100 reviews
  filter(review_num > 100) %>%
  group_by(rating) %>%
  summarize(sum_review = sum(review_num)) %>%
  ggplot(aes(x = rating, y = sum_review)) +
  geom_point() +
  geom_smooth() +
  # axis labels match the aesthetics: rating on x, summed reviews on y
  labs(
    x = "Rating",
    y = "Sum of review numbers"
  )
ggplotly(review_num_rating)
## `geom_smooth()` using method = 'loess' and formula = 'y ~ x'

We are curious about the relationship between rating and review number, so we made this plot to look for any latent association. As shown above, the plot is strongly left-skewed. For most ratings below 3.0, the total number of reviews is below 10,000, which suggests that low-rated restaurants tend to receive few reviews. This also offers some insight for the model-building step, where review number is used as a predictor.
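One caveat is that a sum of reviews per rating grows both with the number of restaurants at that rating and with the reviews each one receives. As a hedged sketch of how the two could be separated, the summary below reports the restaurant count and the median review number alongside the sum. The `toy` data frame and its values are hypothetical; the column names `rating` and `review_num` follow `inspection_raw`.

```r
library(dplyr)

# Hypothetical stand-in for the real data: one row per restaurant
toy <- tibble(
  rating     = c(2.5, 2.5, 4.0, 4.0, 4.0),
  review_num = c(120, 150, 300, 500, 1000)
)

by_rating <- toy %>%
  group_by(rating) %>%
  summarise(
    n_restaurants = n(),                 # how many restaurants share this rating
    sum_review    = sum(review_num),     # the quantity plotted above
    median_review = median(review_num),  # typical review count per restaurant
    .groups = "drop"
  )

by_rating
```

If `median_review` also rises with rating, the pattern is not just a result of more restaurants sitting at the higher ratings.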


Review Scores & Price

# Box plot of Yelp rating at each price level
inspection_raw %>%
  filter(!is.na(price)) %>%
  ggplot(aes(x = price, y = rating, fill = price)) +
  geom_boxplot() +
  labs(title = "Yelp Rating vs. Price")

# Density of Yelp rating, one panel per price level
inspection_raw %>%
  filter(!is.na(price)) %>%
  ggplot(aes(y = rating, fill = price)) +
  geom_density() +
  labs(title = "Density Plot") +
  facet_grid(~price)

There seems to be a positive relationship between the cost of dining and the Yelp rating of NYC restaurants. The distribution of review scores appears to be left-skewed, probably because a small number of restaurants have ratings much lower than the majority of the data.
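The visual impression of a positive relationship could be backed by a simple numeric check. The sketch below is one possible approach, not the analysis used in this report: it treats the price level as an ordered scale (the number of `$` characters, which is an assumption about how `price` is stored) and computes the median rating per level plus a Spearman rank correlation. The `toy` data frame and its values are hypothetical.

```r
library(dplyr)

# Hypothetical stand-in for the real data: one row per restaurant
toy <- tibble(
  price  = c("$", "$", "$$", "$$", "$$$"),
  rating = c(3.0, 3.5, 3.5, 4.0, 4.5)
)

# Median rating at each price level; "$" -> 1, "$$" -> 2, and so on
rating_by_price <- toy %>%
  mutate(price_level = nchar(price)) %>%
  group_by(price_level) %>%
  summarise(median_rating = median(rating), .groups = "drop")

# Rank correlation between price level and rating
rho <- cor(nchar(toy$price), toy$rating, method = "spearman")

rating_by_price
rho
```

A rank correlation is used here because both the price level and the Yelp rating are coarse ordinal scales with many ties, so a linear correlation may be less appropriate.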

Geolocation of Restaurants by Price

This map displays the geographical locations of the restaurants in the dataset. The map is interactive, and the markers are split by price level.

# Interactive map of restaurants, one trace per price level
price_rating_map = inspection_raw %>%
  mutate(text_label = str_c("Price: ", price, "\nRating: ", rating)) %>%
  drop_na(price) %>%
  plot_mapbox(
    lat = ~latitude,
    lon = ~longitude,
    mode = "markers",
    split = ~price,
    hovertext = ~text_label) %>%
  layout(
    mapbox = list(
      style = 'dark',
      zoom = 12.5,
      center = list(lat = 40.71, lon = -73.98)))
# The Mapbox token is read from the MAPBOX_TOKEN environment variable
price_rating_map %>% config(mapboxAccessToken = Sys.getenv("MAPBOX_TOKEN"))
## Warning: Ignoring 8 observations

In general, Manhattan has the largest number of restaurants and a much denser distribution than the other boroughs. In the Bronx, Queens, Brooklyn, and Staten Island, the proportions of the least and second-least expensive restaurants are much higher than the proportions of more expensive ones. In addition, most of the restaurants in the most expensive category are located in Manhattan.

Cuisine Map

# Interactive map of restaurants, one trace per cuisine type
cuisine_map = inspection_raw %>%
  mutate(text_label = str_c("Cuisine: ", cuisine_description)) %>%
  drop_na(price) %>%
  plot_mapbox(
    lat = ~latitude,
    lon = ~longitude,
    mode = "markers",
    split = ~cuisine_description,
    hovertext = ~text_label) %>%
  layout(
    mapbox = list(
      style = 'dark',
      zoom = 12.5,
      center = list(lat = 40.71, lon = -73.98)))
# The Mapbox token is read from the MAPBOX_TOKEN environment variable
cuisine_map %>% config(mapboxAccessToken = Sys.getenv("MAPBOX_TOKEN"))
## Warning: Ignoring 8 observations